Consonant confusion structure based on machine classification of visual features in continuous speech
Authors
Abstract
This study is a first step toward selecting an appropriate subword-unit representation for synthesizing highly intelligible 3D talking faces. Consonant confusions were obtained from optic features in a 320-sentence database spoken by a male talker, using Gaussian mixture models and maximum a posteriori (MAP) classification. The results were compared to consonant confusions from visual-only human perception tests of nonsense CV syllables spoken by the same talker. At the phoneme level, machine classification on the continuous-speech database performed worse than human perception of isolated syllables; however, the machine distinguished the same number of consonant clusters as the human viewers. For modeling optic features for continuous visual speech synthesis, the results suggest that for most consonants, modeling at the phoneme level is more appropriate than modeling at the level of phoneme clusters derived from visual-only human perception tests. For some consonants, context-dependent modeling might further improve accuracy for the talker studied in this paper.
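The classification scheme named in the abstract (one Gaussian mixture model per consonant class, with MAP decisions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two-class setup, 2-D synthetic "optic" features, and mixture sizes are all assumptions made for the example.

```python
# Sketch of MAP consonant classification with per-class GMMs.
# Feature dimensionality, consonant set, and GMM configuration are
# illustrative assumptions, not the paper's actual setup.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for optic feature vectors of two consonant classes.
train = {
    "p": rng.normal(loc=[0.0, 0.0], scale=0.3, size=(200, 2)),
    "f": rng.normal(loc=[2.0, 2.0], scale=0.3, size=(200, 2)),
}

# Fit one Gaussian mixture model per consonant class.
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(x)
        for c, x in train.items()}

# Class priors estimated from training counts.
total = sum(len(x) for x in train.values())
log_prior = {c: np.log(len(x) / total) for c, x in train.items()}

def map_classify(features):
    """Assign each feature vector to the class with maximum posterior:
    argmax_c [ log p(x | c) + log P(c) ]."""
    classes = list(gmms)
    scores = np.stack([gmms[c].score_samples(features) + log_prior[c]
                       for c in classes])
    return [classes[i] for i in scores.argmax(axis=0)]

print(map_classify(np.array([[2.0, 2.1], [0.1, -0.1]])))
```

Tallying the classifier's decisions against the true labels over a held-out set would yield the consonant confusion matrix that the study compares against human perception results.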